Abstract. Exponential stability of the nonlinear filtering equation is revisited when the signal is a finite state Markov chain. An asymptotic upper bound for the filtering error due to an incorrect initial condition is derived in the case of a slowly switching signal.



1. Introduction and the main result 

Consider a discrete time Markov chain $X=(X_n)_{n\in\mathbb Z_+}$ with values in a finite real alphabet $\mathbb S=\{a_1,\dots,a_d\}$, initial distribution $\nu_i=P(X_0=a_i)$ and transition probabilities $\lambda_{ij}=P(X_n=a_j\,|\,X_{n-1}=a_i)$. Suppose that the chain is partially observed via the noisy sequence of random variables $Y=(Y_n)_{n\in\mathbb Z_+}$, generated by

$$Y_n=\sum_{i=1}^d 1_{\{X_n=a_i\}}\,\xi_n(i),\qquad n\ge1,\tag{1.1}$$

where $\xi=(\xi_n)_{n\ge1}$ is a sequence of i.i.d. random vectors with independent entries $\xi_n(i)$, $i=1,\dots,d$, such that

$$P\big(\xi_1(i)\in B\big)=\int_B g_i(u)\,\varphi(du),\qquad B\in\mathcal B(\mathbb R),$$

with densities $g_i(u)$ and a $\sigma$-finite reference measure $\varphi(du)$.

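Let $\mathcal F^Y_n=\sigma\{Y_1,\dots,Y_n\}$ and $\pi_n(i)=P(X_n=a_i\,|\,\mathcal F^Y_n)$. The vector $\pi_n$ of the conditional probabilities satisfies the recursive filtering equation

$$\pi_n=\frac{G(Y_n)\Lambda^*\pi_{n-1}}{|G(Y_n)\Lambda^*\pi_{n-1}|},\qquad\pi_0=\nu,\tag{1.2}$$

where $G(y)$, $y\in\mathbb R$, is a diagonal matrix with entries $g_i(y)$, $\Lambda^*$ is the transpose of the matrix $\Lambda=(\lambda_{ij})$ of transition probabilities and $|x|=\sum_{i=1}^d|x_i|$ for $x\in\mathbb R^d$.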
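The recursion (1.2) is simple to code. The following Python sketch is only an illustration: it assumes that the observations take finitely many values and represents each density $g_i$ as a callable; the names `filter_step` and `bsc` (a Binary Symmetric Channel, anticipating Example 1.2) are hypothetical.

```python
def filter_step(pi_prev, y, Lam, densities):
    """One step of (1.2): pi_n = G(y) Lam^* pi_{n-1} / |G(y) Lam^* pi_{n-1}|.

    pi_prev   -- list with the d conditional probabilities pi_{n-1}(i)
    y         -- the new observation Y_n
    Lam       -- d x d transition matrix, Lam[i][j] = lambda_ij
    densities -- list of d callables, densities[i](y) = g_i(y)
    """
    d = len(pi_prev)
    # prediction: (Lam^* pi)_j = sum_i lambda_ij * pi(i)
    pred = [sum(Lam[i][j] * pi_prev[i] for i in range(d)) for j in range(d)]
    # correction: multiply by the diagonal matrix G(y) = diag(g_1(y), ..., g_d(y))
    unnorm = [densities[j](y) * pred[j] for j in range(d)]
    # normalise with |x| = sum_i |x_i| (all entries are nonnegative here)
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical example: two states observed through a binary symmetric channel
p = 0.2
bsc = [lambda y: 1 - p if y == 0 else p,   # g for X = 0
       lambda y: 1 - p if y == 1 else p]   # g for X = 1
Lam = [[0.9, 0.1], [0.1, 0.9]]
pi1 = filter_step([0.5, 0.5], 1, Lam, bsc)
```

Starting from the uniform distribution, a single observation $Y_1=1$ moves the conditional probability of state 1 to $1-p=0.8$.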

Suppose that (1.2) can be solved subject to a probability distribution $\bar\nu\ne\nu$ and denote the corresponding solution by $\bar\pi_n$. Under certain mild conditions (to be specified later) the limit

$$\gamma:=\lim_{n\to\infty}\frac1n\log|\pi_n-\bar\pi_n|,\qquad P\text{-a.s.}$$

exists, and if it is negative the filter is said to be (exponentially) stable. The stability index $\gamma$ is elusive for explicit calculation, and much research has recently focused on estimating $\gamma$ in various filtering settings (see […] and others). In particular, the Gaussian additive white noise model (cf. (1.1))

$$Y_n=h(X_n)+\sigma\eta_n,\qquad n\ge1,\quad\eta_n\sim\mathcal N(0,1),$$

was considered in […], and the following asymptotic upper bound was derived:

$$\limsup_{\sigma\to0}\sigma^2\gamma(\sigma)\le-\frac12\sum_{i=1}^d\mu_i\min_{j\ne i}\big(h(a_i)-h(a_j)\big)^2,\tag{1.3}$$

where $\mu$ is the stationary distribution of the chain $X$, assumed to be ergodic. Recall that $X$ is ergodic if the limits $\mu_i=\lim_{n\to\infty}P(X_n=a_i)$, $i=1,\dots,d$, exist, do not depend on the initial distribution and are positive, which holds iff $\Lambda^q$ has positive entries for some integer $q\ge1$ (see e.g. [11]).
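For a concrete transition matrix, the ergodicity criterion above can be checked directly. The sketch below is a minimal illustration under stated assumptions: it tests powers $\Lambda^q$ only up to $q=(d-1)^2+1$, relying on Wielandt's bound for primitive matrices, and it computes $\mu$ by plain power iteration; the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    d = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)] for i in range(d)]


def is_ergodic(Lam):
    """True iff Lam^q has all entries positive for some q >= 1.

    Only the zero pattern matters, so it suffices to check q <= (d - 1)**2 + 1
    (Wielandt's bound for primitive matrices).
    """
    d = len(Lam)
    power = [row[:] for row in Lam]
    for _ in range((d - 1) ** 2 + 1):
        if all(x > 0 for row in power for x in row):
            return True
        power = mat_mul(power, Lam)
    return False


def stationary(Lam, iters=100_000, tol=1e-14):
    """Stationary distribution mu (mu_j = sum_i mu_i lambda_ij) by power iteration."""
    d = len(Lam)
    mu = [1.0 / d] * d
    for _ in range(iters):
        new = [sum(mu[i] * Lam[i][j] for i in range(d)) for j in range(d)]
        if max(abs(a - b) for a, b in zip(new, mu)) < tol:
            return new
        mu = new
    return mu
```

For example, the periodic chain with $\Lambda=\begin{pmatrix}0&1\\1&0\end{pmatrix}$ is rejected, while $\Lambda=\begin{pmatrix}0.9&0.1\\0.3&0.7\end{pmatrix}$ is ergodic with $\mu=(0.75,0.25)$.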

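In this note a different scaling of the problem is chosen, namely the slow chain limit of $\gamma$ is considered. Let $X^\varepsilon$ be a Markov chain on $\mathbb S$ with transition probabilities

$$\lambda^\varepsilon_{ij}=P\big(X^\varepsilon_n=a_j\,|\,X^\varepsilon_{n-1}=a_i\big)=\begin{cases}\varepsilon\lambda_{ij}, & i\ne j,\\ 1-\varepsilon\sum_{k\ne i}\lambda_{ik}, & i=j,\end{cases}$$

for an $\varepsilon\in(0,1)$, i.e. $\Lambda^\varepsilon=(1-\varepsilon)I+\varepsilon\Lambda$. Notice that $X^\varepsilon$ is an ergodic chain with the same invariant distribution $\mu$ as $X$. Denote by $Y^\varepsilon$ the corresponding observation sequence generated by (1.1) with $X$ replaced by $X^\varepsilon$, and let $\pi^\varepsilon$, $\bar\pi^\varepsilon$ be the solutions of (1.2) subject to $\nu$, $\bar\nu$, with $Y$ and $\Lambda$ replaced by $Y^\varepsilon$ and $\Lambda^\varepsilon$.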
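A minimal sketch of the slow chain, assuming $\Lambda$ is given as a list of rows: it builds $\Lambda^\varepsilon=(1-\varepsilon)I+\varepsilon\Lambda$ and lets one verify numerically that $\mu$ remains invariant, since $\mu\Lambda^\varepsilon=(1-\varepsilon)\mu+\varepsilon\mu\Lambda=\mu$. The function names are illustrative only.

```python
def slow_chain(Lam, eps):
    """Transition matrix of X^eps: eps * lambda_ij off the diagonal and
    1 - eps * sum_{k != i} lambda_ik on it, i.e. (1 - eps) I + eps Lam."""
    d = len(Lam)
    return [[(1.0 - eps if i == j else 0.0) + eps * Lam[i][j] for j in range(d)]
            for i in range(d)]


def left_apply(mu, M):
    """Row vector times matrix: (mu M)_j = sum_i mu_i M[i][j]."""
    d = len(mu)
    return [sum(mu[i] * M[i][j] for i in range(d)) for j in range(d)]


Lam = [[0.9, 0.1], [0.3, 0.7]]
mu = [0.75, 0.25]                  # stationary distribution of Lam
Lam_slow = slow_chain(Lam, 0.01)   # jumps are 100 times rarer
```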

Theorem 1.1. Assume that $X$ is ergodic and the noise densities $g_i(u)$

(a1) are bounded,

(a2) have the same support,

(a3) and satisfy $\int_{\mathbb R}g_i(u)\log g_j(u)\,\varphi(du)>-\infty$ for all $i,j$.

Then for any pair $(\nu,\bar\nu)$ of probability distributions on $\mathbb S$

$$\gamma(\varepsilon)\le-\sum_{i=1}^d\mu_i\min_{j\ne i}\mathcal D(g_i\,\|\,g_j)+o(1),\qquad\varepsilon\to0,\tag{1.4}$$

where $\mathcal D(g_i\,\|\,g_j)=\int_{\mathbb R}g_i(u)\log\frac{g_i}{g_j}(u)\,\varphi(du)$ are the Kullback-Leibler relative entropies. For $d=2$ the asymptotic (1.4) is precise, i.e.

$$\gamma(\varepsilon)=-\mu_1\mathcal D(g_1\,\|\,g_2)-\mu_2\mathcal D(g_2\,\|\,g_1)+o(1),\qquad\varepsilon\to0.\tag{1.5}$$
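The leading term of (1.4) is explicit once the noise densities are known. The sketch below assumes discrete observations, so that each $g_i$ is a probability mass function on a common finite alphabet (as in Section 3) and $\mathcal D$ reduces to a finite sum; the function names are hypothetical.

```python
import math


def kl(g, h):
    """Kullback-Leibler divergence D(g || h) = sum_y g(y) log(g(y) / h(y))
    for probability mass functions with a common support (cf. (a2))."""
    return sum(gy * math.log(gy / hy) for gy, hy in zip(g, h) if gy > 0)


def slow_limit_bound(mu, pmfs):
    """Leading term of (1.4): -sum_i mu_i min_{j != i} D(g_i || g_j).
    For d = 2 this is the exact limit (1.5)."""
    d = len(mu)
    return -sum(mu[i] * min(kl(pmfs[i], pmfs[j]) for j in range(d) if j != i)
                for i in range(d))


# Binary symmetric channel with p = 0.1: the bound equals -D_p = -0.8 log 9
bsc = [[0.9, 0.1], [0.1, 0.9]]
value = slow_limit_bound([0.5, 0.5], bsc)
```

If two states share the same noise distribution, their minimal entropy is zero and they contribute nothing to the bound, in line with the discussion of "synchronizing" states below.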

This theorem reveals the following interesting properties of $\gamma(\varepsilon)$ (see Figure 1).

1. $\gamma(\varepsilon)$ may be discontinuous at $\varepsilon=0$:

$$\gamma(0+):=\limsup_{\varepsilon\downarrow0}\gamma(\varepsilon)<\gamma(0)=0,$$

if at least one of the entropies $\mathcal D(g_i\,\|\,g_j)$ is strictly positive. This means that for small $\varepsilon>0$ the filter remains stable with virtually the same stability index as long as the chain is not "frozen" completely, while the filter corresponding to the limit chain $X^0_n=X_0$, $n\ge1$, may be unstable (e.g. when some but not all $g_i(u)$'s coincide $\varphi$-a.s.). Such behavior is not observed in the analogous "slowly varying" setting for the Kalman-Bucy filter, where the state space of the signal is continuous.

Surprising as it may seem at first glance, this phenomenon is quite natural for signals with a discrete state space and can be explained as follows. The distance $|\pi^\varepsilon_n-\bar\pi^\varepsilon_n|$ never increases and tends to decrease exponentially fast whenever $X^\varepsilon$ resides in a state with a distinct noise probability distribution. Since the average occupation time of this "synchronizing" state does not depend on $\varepsilon$, the decay remains exponential with a nonzero average rate. The "dual" manifestation of this phenomenon is that the filter stability improves when the signal-to-noise ratio is increased in the setting of (1.3) (see […]).

2. As demonstrated in the following example, $\gamma(\varepsilon)$ may have a maximum at some $\varepsilon^*>0$ or, in other words, stability may improve when the chain is slowed down! This provides yet another piece of evidence against the false intuition directly relating stability of the filter to ergodic properties of the signal (see an extended discussion of this issue in [3]). Such behavior stems from the delicate interplay between two stabilizing mechanisms: ergodicity of the signal and the synchronizing effect of the observations. The first dominates the second for the faster chain, and vice versa when the chain is slow.

Example 1.2. Consider the so-called Binary Symmetric Channel (BSC) model, for which $X_n\in\{0,1\}$ is a symmetric chain with jump probability $\lambda$ and $Y_n=(X_n-\xi_n)^2$, where $\xi$ is an i.i.d. $\{0,1\}$-valued sequence with $P(\xi_1=1)=p\in(0,1/2)$. Let $X^\varepsilon$ and $Y^\varepsilon$ denote the "slow" instances as defined above. In this case more can be said about the convergence in (1.5) (see the proof in Section 3 below), namely




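$$\gamma(\varepsilon)\ge-\mathcal D_p+\frac{4\lambda\big(\log2-h(p)\big)}{\mathcal D_p}\,\varepsilon\log\varepsilon^{-1}\big(1+o(1)\big),\qquad\varepsilon\to0,\tag{1.6}$$

where $\mathcal D_p:=p\log\frac{p}{1-p}+(1-p)\log\frac{1-p}{p}$ and $h(p)=-p\log p-(1-p)\log(1-p)$. On the other hand, $\gamma(\varepsilon)\le\log(1-2\varepsilon\lambda)\to-\infty$ as $\varepsilon\to1/(2\lambda)$ (see e.g. Theorem 2.3 in […]). Since the second term in the expansion of $\gamma(\varepsilon)$ in (1.6) is positive and, by (1.5), $\gamma(\varepsilon)\to-\mathcal D_p$ as $\varepsilon\to0$, one gets the qualitative behavior depicted in Figure 1. □

[Figure 1. $\gamma(\varepsilon)$ for the BSC example]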
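To reproduce the qualitative shape in Figure 1 one can evaluate the terms appearing in (1.6) and in the upper bound. This is only a sketch: the $(1+o(1))$ factor is dropped, so the two-term value is meaningful only for small $\varepsilon$, and the function name is hypothetical.

```python
import math


def bsc_expansion(p, lam, eps):
    """For the BSC example with p in (0, 1/2), return
    - the two leading terms of (1.6), with the (1 + o(1)) factor dropped;
    - the upper bound log(1 - 2 eps lam), valid while 2 eps lam < 1;
    - the limit -D_p of gamma(eps) as eps -> 0."""
    h = -p * math.log(p) - (1 - p) * math.log(1 - p)
    D = p * math.log(p / (1 - p)) + (1 - p) * math.log((1 - p) / p)
    two_terms = -D + 4 * lam * (math.log(2) - h) / D * eps * math.log(1 / eps)
    upper = math.log(1 - 2 * eps * lam)
    return two_terms, upper, -D


two, up, lim0 = bsc_expansion(0.1, 0.5, 0.01)
```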



2. The proof of Theorem 1.1

Hereafter the assumptions of Theorem 1.1 are in force and the following notation is used: probability measures on $\mathbb S$ are identified with (column) vectors in $\mathcal S^{d-1}=\{x\in\mathbb R^d: x_i\ge0,\ \sum_{i=1}^dx_i=1\}$; $\mu(f):=\sum_{i=1}^df(a_i)\mu_i$ for $f:\mathbb S\to\mathbb R$ and $\mu\in\mathcal S^{d-1}$; $\mu(A):=\mu(1_{\{A\}})$ for $A\subseteq\mathbb S$. For a random sequence $Z=(Z_n)_{n\in\mathbb Z}$ and $m\ge k$ the notation $\mathcal F^Z_{[k,m]}=\sigma\{Z_k,\dots,Z_m\}$ is used, and $\mathcal F^Z_n:=\mathcal F^Z_{[1,n]}$ for brevity. Convergence of random sequences is understood in the $P$-a.s. sense unless stated otherwise.

The proof relies on the following idea from […]. Recall that $\pi_n=\rho_n/|\rho_n|$, $n\ge0$, where $\rho_n$ is the solution of the Zakai linear equation ($\bar\rho_n$ is obtained similarly)

$$\rho_n=G(Y_n)\Lambda^*\rho_{n-1},\qquad\rho_0=\nu.\tag{2.1}$$

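Let $\rho_n\wedge\bar\rho_n:=\frac12\big(\rho_n\bar\rho_n^*-\bar\rho_n\rho_n^*\big)$ denote the exterior product of $\rho_n$ and $\bar\rho_n$ (for matrices, $|\cdot|$ denotes the sum of the absolute values of the entries). The elementary inequality

$$\frac{|\rho_n\wedge\bar\rho_n|}{|\rho_n||\bar\rho_n|}\le|\pi_n-\bar\pi_n|\le\frac{2\,|\rho_n\wedge\bar\rho_n|}{|\rho_n||\bar\rho_n|}$$

implies

$$\gamma:=\lim_{n\to\infty}\frac1n\log|\pi_n-\bar\pi_n|=\lim_{n\to\infty}\frac1n\log|\rho_n\wedge\bar\rho_n|-\lim_{n\to\infty}\frac1n\log|\rho_n|-\lim_{n\to\infty}\frac1n\log|\bar\rho_n|.\tag{2.2}$$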
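The elementary inequality preceding (2.2) can be sanity-checked numerically. The sketch assumes the conventions $\rho\wedge\bar\rho=\frac12(\rho\bar\rho^*-\bar\rho\rho^*)$ and $|\cdot|$ equal to the sum of the absolute values of the entries, under which the constants are $1$ and $2$; with other norms the constants change but (2.2) is unaffected. The helper names are hypothetical.

```python
import random


def l1(v):
    """|x| = sum_i |x_i|."""
    return sum(abs(t) for t in v)


def wedge_norm(a, b):
    """|a ^ b| for a ^ b := (a b^* - b a^*) / 2, with |.| = sum of |entries|."""
    d = len(a)
    return sum(abs(a[i] * b[j] - b[i] * a[j]) for i in range(d) for j in range(d)) / 2


def sandwich_holds(rho, rho_bar, tol=1e-12):
    """Check |rho^rho_bar|/(|rho||rho_bar|) <= |pi - pi_bar| <= 2|rho^rho_bar|/(|rho||rho_bar|)."""
    pi = [t / l1(rho) for t in rho]
    pi_bar = [t / l1(rho_bar) for t in rho_bar]
    dist = l1([s - t for s, t in zip(pi, pi_bar)])
    ratio = wedge_norm(rho, rho_bar) / (l1(rho) * l1(rho_bar))
    return ratio <= dist + tol and dist <= 2 * ratio + tol


rng = random.Random(1)
trials = [([rng.random() for _ in range(4)], [rng.random() for _ in range(4)])
          for _ in range(1000)]
```

The upper constant is attained, e.g., for $\rho=(1,0)$, $\bar\rho=(0,1)$.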

Since the $g_i(u)$'s are bounded, the limits in the right hand side exist by virtue of the Oseledec Multiplicative Ergodic Theorem (MET). Moreover, since $(G(Y_n)\Lambda^*)_{n\ge1}$ are matrices with nonnegative entries, the Perron-Frobenius theorem implies

$$\lim_{n\to\infty}\frac1n\log|\rho_n|=\lim_{n\to\infty}\frac1n\log|\bar\rho_n|:=\lambda_1,\qquad\forall\,\nu,\bar\nu\in\mathcal S^{d-1},$$

where $\lambda_1$ is the top Lyapunov exponent corresponding to (2.1). Similarly, the MET implies $\limsup_{n\to\infty}\frac1n\log|\rho_n\wedge\bar\rho_n|\le\lambda_1+\lambda_2$, and thus one concludes that $\gamma\le\lambda_2-\lambda_1\le0$, i.e. the filter stability index is controlled by the Lyapunov spectral gap of (2.1). The reader is referred to […] for further details.
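The bound $\gamma\le\lambda_2-\lambda_1$ suggests estimating the Lyapunov spectrum of (2.1) numerically. The following sketch uses the standard QR (repeated Gram-Schmidt) method, which is not part of the text, together with a hypothetical helper `bsc_matrices` that generates the matrices $G(Y_n)\Lambda^*$ for the BSC model of Example 1.2 (there $\Lambda$ is symmetric).

```python
import math
import random


def lyapunov_spectrum(matrices):
    """Estimate the Lyapunov exponents of M_n ... M_1 by the QR method:
    push an orthonormal basis through each matrix, re-orthonormalise it by
    modified Gram-Schmidt and average the logs of the diagonal of R."""
    d = len(matrices[0])
    basis = [[1.0 if i == c else 0.0 for i in range(d)] for c in range(d)]  # columns
    sums = [0.0] * d
    for m in matrices:
        images = [[sum(m[i][k] * col[k] for k in range(d)) for i in range(d)]
                  for col in basis]
        basis = []
        for c, v in enumerate(images):
            for u in basis:
                proj = sum(a * b for a, b in zip(v, u))
                v = [a - proj * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            sums[c] += math.log(norm)      # log R[c][c]
            basis.append([a / norm for a in v])
    return [s / len(matrices) for s in sums]


def bsc_matrices(p, jump, n, seed=0):
    """Hypothetical helper: the random matrices G(Y_k) Lam^* of (2.1) for the
    BSC example; Lam is symmetric with off-diagonal entry `jump`."""
    rng = random.Random(seed)
    lam = [[1 - jump, jump], [jump, 1 - jump]]
    x, mats = rng.randrange(2), []
    for _ in range(n):
        if rng.random() < jump:
            x = 1 - x
        y = x ^ (rng.random() < p)                              # Y = X xor xi
        g = [1 - p if y == 0 else p, 1 - p if y == 1 else p]    # g_0(y), g_1(y)
        mats.append([[g[i] * lam[j][i] for j in range(2)] for i in range(2)])
    return mats


mats = bsc_matrices(0.1, 0.025, 20_000)
lam1, lam2 = lyapunov_spectrum(mats)
gap = lam2 - lam1      # upper bound for gamma
```

As a consistency check, $\lambda_1+\lambda_2$ equals the average of $\log|\det(G(Y_n)\Lambda^*)|$, which for the BSC is $\log\big(p(1-p)(1-2\cdot\mathtt{jump})\big)$.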

The statement of Theorem 1.1 follows from (2.2) and the asymptotic expressions derived in Lemmas 2.1 and 2.2 below.

2.1. Asymptotic expression for $\lambda_1(\varepsilon)$.

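Lemma 2.1. For any $\varepsilon>0$ the Markov process $(X^\varepsilon_n,\pi^\varepsilon_n)$ has a unique stationary invariant measure $\mathcal M^\varepsilon$. The top Lyapunov exponent is given by

$$\lambda_1(\varepsilon)=\int_{\mathcal S^{d-1}}\sum_{i=1}^d\big((\Lambda^\varepsilon)^*u\big)_i\int_{\mathbb R}g_i(y)\log\big|G(y)(\Lambda^\varepsilon)^*u\big|\,\varphi(dy)\,\mathcal M^\varepsilon_\pi(du),\tag{2.3}$$

where $\mathcal M^\varepsilon_\pi$ is the $\pi$-marginal of $\mathcal M^\varepsilon$. For each $\mathcal J_j=\{a_i:\mathcal D(g_j\,\|\,g_i)=0\}$

$$\lim_{\varepsilon\to0}\int\big(1_{\{x\in\mathcal J_j\}}-u(\mathcal J_j)\big)^2\,\mathcal M^\varepsilon(dx,du)=0,\tag{2.4}$$

and in particular

$$\lim_{\varepsilon\to0}\lambda_1(\varepsilon)=\sum_{i=1}^d\mu_i\int_{\mathbb R}g_i(y)\log g_i(y)\,\varphi(dy).\tag{2.5}$$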
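Proof. The process $(X^\varepsilon_n,\pi^\varepsilon_n)$ is Markov and by (a1) it is also Feller, and thus at least one invariant measure $\mathcal M^\varepsilon$ exists. Its uniqueness can be deduced (as in Theorem 7.1 in [1]) from the stability property $\lim_{n\to\infty}|\pi^\varepsilon_n-\bar\pi^\varepsilon_n|=0$, $\forall\,\nu,\bar\nu\in\mathcal S^{d-1}$, which in turn holds under assumption (a2) by the arguments used in the proof of Theorem 2.3 in […] (see also Theorem 4.1 in […]). Concentration properties of $\mathcal M^\varepsilon$ have been studied in […] when all the noises are distinct, i.e. $\mathcal D(g_i\,\|\,g_j)>0$ for all $i\ne j$, which is not necessarily the case here.

Let $\tilde X^\varepsilon$ be the stationary chain (i.e. $\tilde X^\varepsilon_0\sim\mu$) and $\tilde\pi^\varepsilon$ the corresponding optimal filtering process, generated by (1.2) subject to $\tilde\pi^\varepsilon_0=\mu$. For an $f:\mathbb S\to\mathbb R$ and $n,m\ge0$ ($\tilde Y^\varepsilon$ denotes the observations corresponding to $\tilde X^\varepsilon$)

$$\begin{aligned}E\big(f(\tilde X^\varepsilon_{n+m})-\tilde\pi^\varepsilon_{n+m}(f)\big)^2&=E\Big(f(\tilde X^\varepsilon_{n+m})-E\big(f(\tilde X^\varepsilon_{n+m})\,|\,\mathcal F^{\tilde Y^\varepsilon}_{n+m}\big)\Big)^2\\&\le E\Big(f(\tilde X^\varepsilon_{n+m})-E\big(f(\tilde X^\varepsilon_{n+m})\,|\,\mathcal F^{\tilde Y^\varepsilon}_{[n+1,n+m]}\big)\Big)^2\\&\stackrel{\dagger}{=}E\Big(f(\tilde X^\varepsilon_m)-E\big(f(\tilde X^\varepsilon_m)\,|\,\mathcal F^{\tilde Y^\varepsilon}_m\big)\Big)^2=E\big(f(\tilde X^\varepsilon_m)-\tilde\pi^\varepsilon_m(f)\big)^2,\end{aligned}$$

where stationarity of $(\tilde X^\varepsilon,\tilde Y^\varepsilon)$ has been used in $\dagger$. This means that the filtering error for the stationary signal does not increase with time. Then, by uniqueness of $\mathcal M^\varepsilon$, for any fixed $m\ge1$

$$\int\big(f(x)-u(f)\big)^2\,\mathcal M^\varepsilon(dx,du)=\lim_{n\to\infty}E\big(f(\tilde X^\varepsilon_n)-\tilde\pi^\varepsilon_n(f)\big)^2\le E\big(f(\tilde X^\varepsilon_m)-\tilde\pi^\varepsilon_m(f)\big)^2.\tag{2.6}$$

Define

$$\zeta^\varepsilon_m(i):=\frac{\mu_i\prod_{k=1}^mg_i(\tilde Y^\varepsilon_k)}{\sum_{j=1}^d\mu_j\prod_{k=1}^mg_j(\tilde Y^\varepsilon_k)},\qquad i=1,\dots,d,$$

and let $A^\varepsilon_m=\{\tilde X^\varepsilon_k=\tilde X^\varepsilon_0,\ \forall k\le m\}$, the event that $\tilde X^\varepsilon$ does not jump on $[0,m]$. Notice that on the set $A^\varepsilon_m$ the observation process is independent of $\varepsilon$, namely

$$\tilde Y^\varepsilon_k=Y^0_k:=\sum_{i=1}^d1_{\{X_0=a_i\}}\xi_k(i),\qquad k=1,\dots,m,$$

where $X_0:=\tilde X^\varepsilon_0$; let $\zeta^0_m$ be defined as $\zeta^\varepsilon_m$ with $\tilde Y^\varepsilon$ replaced by $Y^0$. Then by optimality of $\tilde\pi^\varepsilon$

$$\begin{aligned}E\big(f(\tilde X^\varepsilon_m)-\tilde\pi^\varepsilon_m(f)\big)^2&\le E\big(f(\tilde X^\varepsilon_m)-\zeta^\varepsilon_m(f)\big)^2\\&=E\,1_{A^\varepsilon_m}\big(f(X_0)-\zeta^0_m(f)\big)^2+E\,1_{\Omega\setminus A^\varepsilon_m}\big(f(\tilde X^\varepsilon_m)-\zeta^\varepsilon_m(f)\big)^2\\&\le E\big(f(X_0)-\zeta^0_m(f)\big)^2+4\max_i|f(a_i)|^2\big(1-P(A^\varepsilon_m)\big)\xrightarrow[\varepsilon\to0]{}E\big(f(X_0)-\zeta^0_m(f)\big)^2.\end{aligned}$$

For $f(x):=1_{\{x\in\mathcal J_j\}}$ the latter and (2.6) imply

$$\limsup_{\varepsilon\to0}\int\big(1_{\{x\in\mathcal J_j\}}-u(\mathcal J_j)\big)^2\,\mathcal M^\varepsilon(dx,du)\le E\big(1_{\{X_0\in\mathcal J_j\}}-\zeta^0_m(\mathcal J_j)\big)^2\xrightarrow[m\to\infty]{}0,$$

where the convergence holds (by the martingale convergence theorem) since $\{X_0\in\mathcal J_j\}\in\mathcal F^{Y^0}_\infty=\bigvee_{n\ge1}\mathcal F^{Y^0}_n$ by definition of $\mathcal J_j$, and since $\zeta^0_m(i)$, $i=1,\dots,d$, are the optimal estimates of $1_{\{X_0=a_i\}}$ given $\mathcal F^{Y^0}_m$.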

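Once the existence of the ergodic stationary pair $(X^\varepsilon,\pi^\varepsilon)$ is established¹, one may use it to realize the limit $\lambda_1$ by means of the approach due to H. Furstenberg and R. Khasminskii (see e.g. [10]). The idea is to study the growth rate of $\rho^\varepsilon_n$ by projecting it on the unit sphere ($\mathcal S^{d-1}$ in this case):

$$|\rho^\varepsilon_n|=\big|G(Y^\varepsilon_n)(\Lambda^\varepsilon)^*\rho^\varepsilon_{n-1}\big|=|\rho^\varepsilon_{n-1}|\,\Big|G(Y^\varepsilon_n)(\Lambda^\varepsilon)^*\frac{\rho^\varepsilon_{n-1}}{|\rho^\varepsilon_{n-1}|}\Big|=|\rho^\varepsilon_{n-1}|\,\big|G(Y^\varepsilon_n)(\Lambda^\varepsilon)^*\pi^\varepsilon_{n-1}\big|.$$

Then by the law of large numbers (LLN) for ergodic processes (the required integrability conditions are provided by (a1) and (a3))

$$\begin{aligned}\lambda_1(\varepsilon)&=\lim_{n\to\infty}\frac1n\log|\rho^\varepsilon_n|=\lim_{n\to\infty}\frac1n\sum_{m=1}^n\log\big|G(Y^\varepsilon_m)(\Lambda^\varepsilon)^*\pi^\varepsilon_{m-1}\big|=E\log\big|G(Y^\varepsilon_1)(\Lambda^\varepsilon)^*\pi^\varepsilon_0\big|\\&=E\sum_{i=1}^d1_{\{X^\varepsilon_1=a_i\}}\log\big|G(\xi_1(i))(\Lambda^\varepsilon)^*\pi^\varepsilon_0\big|=E\sum_{i=1}^dP\big(X^\varepsilon_1=a_i\,|\,\pi^\varepsilon_0\big)\log\big|G(\xi_1(i))(\Lambda^\varepsilon)^*\pi^\varepsilon_0\big|\\&=E\sum_{i=1}^d\big((\Lambda^\varepsilon)^*\pi^\varepsilon_0\big)_i\log\big|G(\xi_1(i))(\Lambda^\varepsilon)^*\pi^\varepsilon_0\big|.\end{aligned}\tag{2.7}$$

The latter expression is nothing but (2.3). The asymptotic (2.5) follows from $\Lambda^\varepsilon=I+O(\varepsilon)$ and the concentration (2.4) of $\mathcal M^\varepsilon$ as $\varepsilon\to0$, since the $g_i(u)$'s coincide $\varphi$-almost surely for all $a_i\in\mathcal J_j$, for any $j$, and the $X$-marginal of $\mathcal M^\varepsilon$ is given by $\mathcal M^\varepsilon_X(dx)=\sum_{i=1}^d\mu_i\delta_{a_i}(dx)$. □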
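The Furstenberg-Khasminskii representation translates directly into a Monte Carlo estimate of $\lambda_1(\varepsilon)$: run the normalized filter along a simulated path and average $\log|G(Y_m)\Lambda^*\pi_{m-1}|$. The sketch assumes discrete observations (pmfs `pmfs[i][y]`) and starts the filter at $\mu$ rather than at $\mathcal M^\varepsilon$, which should not affect the ergodic average; the function name is hypothetical.

```python
import math
import random


def estimate_lambda1(Lam, pmfs, mu, n, seed=0):
    """Monte Carlo estimate of lambda_1 = lim (1/n) log|rho_n| through
    (1/n) sum_m log|G(Y_m) Lam^* pi_{m-1}|, cf. (2.7).

    Lam  -- d x d transition matrix (Lam[i][j] = lambda_ij)
    pmfs -- pmfs[i][y] = g_i(y) for observations y in {0, ..., d' - 1}
    mu   -- law of X_0, also used as the filter's initial condition
    """
    rng = random.Random(seed)
    d = len(Lam)
    x = rng.choices(range(d), weights=mu)[0]
    pi, total = list(mu), 0.0
    for _ in range(n):
        x = rng.choices(range(d), weights=Lam[x])[0]               # signal X_m
        y = rng.choices(range(len(pmfs[x])), weights=pmfs[x])[0]   # observation Y_m
        pred = [sum(Lam[i][j] * pi[i] for i in range(d)) for j in range(d)]
        unnorm = [pmfs[j][y] * pred[j] for j in range(d)]
        norm = sum(unnorm)                 # |G(Y_m) Lam^* pi_{m-1}|
        total += math.log(norm)            # increment of log|rho_m|
        pi = [u / norm for u in unnorm]    # projected direction pi_m
    return total / n


slow = [[0.99, 0.01], [0.01, 0.99]]
v = estimate_lambda1(slow, [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5], 20_000)
```

For discrete observations the estimate is the negative of the empirical entropy rate of $Y$, which anticipates (3.1).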

2.2. Asymptotic bound for $\lambda_1(\varepsilon)+\lambda_2(\varepsilon)$.

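Lemma 2.2. For any $\nu,\bar\nu\in\mathcal S^{d-1}$

$$\limsup_{n\to\infty}\frac1n\log|\rho^\varepsilon_n\wedge\bar\rho^\varepsilon_n|\le\sum_{\ell=1}^d\mu_\ell\max_{k\ne m}\int_{\mathbb R}g_\ell(u)\log\big(g_m(u)g_k(u)\big)\varphi(du)+o(1),\qquad\varepsilon\to0.\tag{2.8}$$

In the case $d=2$ (and $\nu\ne\bar\nu$)

$$\lim_{n\to\infty}\frac1n\log|\rho^\varepsilon_n\wedge\bar\rho^\varepsilon_n|=\log(1-\varepsilon\lambda_{12}-\varepsilon\lambda_{21})+\mu_1\int_{\mathbb R}g_1(u)\log\big(g_1(u)g_2(u)\big)\varphi(du)+\mu_2\int_{\mathbb R}g_2(u)\log\big(g_1(u)g_2(u)\big)\varphi(du).\tag{2.9}$$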

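Proof. The process $Q^\varepsilon_n:=\rho^\varepsilon_n\wedge\bar\rho^\varepsilon_n$ evolves in the space of antisymmetric matrices (with zero diagonal) and satisfies the linear equation

$$Q^\varepsilon_n=G(Y^\varepsilon_n)(\Lambda^\varepsilon)^*Q^\varepsilon_{n-1}\Lambda^\varepsilon G(Y^\varepsilon_n),\qquad Q^\varepsilon_0=\nu\wedge\bar\nu,$$

or, in the componentwise notation,

$$Q^\varepsilon_n(i,j)=\sum_{1\le k\ne\ell\le d}g_i(Y^\varepsilon_n)\lambda^\varepsilon_{ki}Q^\varepsilon_{n-1}(k,\ell)\lambda^\varepsilon_{\ell j}g_j(Y^\varepsilon_n),\qquad i\ne j.$$

¹ Such a pair can be generated by taking both $X_0$ and $\pi_0$ randomly distributed according to $\mathcal M^\varepsilon$, and its definition can be extended to negative times by the usual arguments. Note that this is different from the pair $(\tilde X^\varepsilon,\tilde\pi^\varepsilon)$ used in the proof of the concentration of $\mathcal M^\varepsilon$.

Unlike in the case of (2.1), it is not clear whether the limit $\lim_{n\to\infty}\frac1n\log|Q^\varepsilon_n|$ depends on $\nu,\bar\nu$, or whether $\Pi^\varepsilon_n=Q^\varepsilon_n/|Q^\varepsilon_n|$ has any useful concentration properties as $\varepsilon\to0$. However, the technique used in the previous section still gives the upper bound. With a fixed integer $r\ge1$

$$\begin{aligned}|Q^\varepsilon_n|&=\Big|\big(G(Y^\varepsilon_n)(\Lambda^\varepsilon)^*\cdots G(Y^\varepsilon_{n-r+1})(\Lambda^\varepsilon)^*\big)\,Q^\varepsilon_{n-r}\,\big(\Lambda^\varepsilon G(Y^\varepsilon_{n-r+1})\cdots\Lambda^\varepsilon G(Y^\varepsilon_n)\big)\Big|\\&\le|Q^\varepsilon_{n-r}|\Big(\sum_{i\ne j}|\Pi^\varepsilon_{n-r}(i,j)|\prod_{m=n-r+1}^ng_i(Y^\varepsilon_m)g_j(Y^\varepsilon_m)+c_1(r)\varepsilon\Big)\\&\le|Q^\varepsilon_{n-r}|\Big(\max_{i\ne j}\prod_{m=n-r+1}^ng_i(Y^\varepsilon_m)g_j(Y^\varepsilon_m)+c_1(r)\varepsilon\Big),\qquad n\ge r,\end{aligned}$$

with a constant $c_1(r)>0$ depending only on $r$ (due to assumption (a1)). By the MET the limit $\lim_{n\to\infty}\frac1n\log|Q^\varepsilon_n|$ exists $P$-a.s., and hence (recall the definition of $Y^\varepsilon$ and of the no-jump event $A^\varepsilon_r$ from Section 2.1, now for the stationary chain $X^\varepsilon$ under $P_\mu$)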

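$$\begin{aligned}\lim_{n\to\infty}\frac1n\log|Q^\varepsilon_n|&=\lim_{n\to\infty}\frac1{nr}\log|Q^\varepsilon_{nr}|\le\lim_{n\to\infty}\frac1{nr}\sum_{k=1}^n\log\Big(\max_{i\ne j}\prod_{m=kr-r+1}^{kr}g_i(Y^\varepsilon_m)g_j(Y^\varepsilon_m)+c_1(r)\varepsilon\Big)\\&\stackrel{\dagger}{=}\frac1rE\log\Big(\max_{i\ne j}\prod_{m=1}^rg_i(Y^\varepsilon_m)g_j(Y^\varepsilon_m)+c_1(r)\varepsilon\Big)\\&\le\frac1rE\,1_{A^\varepsilon_r}\log\Big(\max_{i\ne j}\prod_{m=1}^rg_i(Y^\varepsilon_m)g_j(Y^\varepsilon_m)+c_1(r)\varepsilon\Big)+c_2(r)\big(1-P_\mu(A^\varepsilon_r)\big)\\&\le\frac1r\sum_{\ell=1}^d\mu_\ell\,E\log\Big(\max_{i\ne j}\prod_{m=1}^rg_i(\xi_m(\ell))g_j(\xi_m(\ell))+c_1(r)\varepsilon\Big)+c_3(r)\big(1-P_\mu(A^\varepsilon_r)\big)\\&\xrightarrow[\varepsilon\to0]{}\sum_{\ell=1}^d\mu_\ell\,E\max_{i\ne j}\frac1r\log\prod_{m=1}^rg_i(\xi_m(\ell))g_j(\xi_m(\ell)),\end{aligned}$$

where the LLN was used in $\dagger$ and $c_j(r)$ stand for $r$-dependent constants. Applying the LLN once again one gets, for each $\ell$,

$$\frac1r\log\prod_{m=1}^rg_i(\xi_m(\ell))g_j(\xi_m(\ell))=\frac1r\sum_{m=1}^r\log\big(g_i(\xi_m(\ell))g_j(\xi_m(\ell))\big)\xrightarrow[r\to\infty]{}\int_{\mathbb R}g_\ell(u)\log\big(g_i(u)g_j(u)\big)\varphi(du),\qquad P\text{-a.s.}$$

Since "max" is a continuous function,

$$\max_{i\ne j}\frac1r\log\prod_{m=1}^rg_i(\xi_m(\ell))g_j(\xi_m(\ell))\xrightarrow[r\to\infty]{}\max_{i\ne j}\int_{\mathbb R}g_\ell(u)\log\big(g_i(u)g_j(u)\big)\varphi(du),\qquad P\text{-a.s.},$$

and by the uniform integrability, provided by assumptions (a1) and (a3),

$$E\max_{i\ne j}\frac1r\log\prod_{m=1}^rg_i(\xi_m(\ell))g_j(\xi_m(\ell))\xrightarrow[r\to\infty]{}\max_{i\ne j}\int_{\mathbb R}g_\ell(u)\log\big(g_i(u)g_j(u)\big)\varphi(du).$$

Putting all parts together one gets the bound (2.8). In the case $d=2$ the process $Q^\varepsilon_n$ is one dimensional, since $(\Lambda^\varepsilon)^*Q\Lambda^\varepsilon=\det(\Lambda^\varepsilon)\,Q$ for any antisymmetric $2\times2$ matrix $Q$ and $\det\Lambda^\varepsilon=1-\varepsilon\lambda_{12}-\varepsilon\lambda_{21}$; all the calculations can then be carried out exactly, leading to the expression (2.9). □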
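For $d=2$ the exactness of (2.9) rests on the identity $\Lambda^*Q\Lambda=\det(\Lambda)\,Q$ for $2\times2$ antisymmetric $Q$. The following sketch iterates the componentwise recursion for $Q^\varepsilon_n$ along a sequence of density values and compares the resulting growth rate with the closed form; replacing $g_1(Y_m),g_2(Y_m)$ by arbitrary positive numbers is an assumption made only for this check, and the function names are hypothetical.

```python
import math
import random


def wedge_step(Q, Lam, gy):
    """Componentwise form of Q_n = G(y) Lam^* Q_{n-1} Lam G(y):
    Q_n(i, j) = sum_{k != l} g_i(y) lambda_ki Q_{n-1}(k, l) lambda_lj g_j(y)."""
    d = len(Q)
    return [[0.0 if i == j else
             gy[i] * gy[j] * sum(Lam[k][i] * Q[k][l] * Lam[l][j]
                                 for k in range(d) for l in range(d) if k != l)
             for j in range(d)] for i in range(d)]


def growth_rates_d2(Lam, gy_seq):
    """Return (1/n) log|Q_n| obtained by iterating wedge_step from
    Q_0 = e_1 ^ e_2, and the closed form
    (1/n) sum_m [log|det Lam| + log(g_1(Y_m) g_2(Y_m))]."""
    Q = [[0.0, 0.5], [-0.5, 0.0]]                   # |Q_0| = 1
    det = Lam[0][0] * Lam[1][1] - Lam[0][1] * Lam[1][0]
    log_norm, closed = 0.0, 0.0
    for gy in gy_seq:
        size_old = sum(abs(t) for row in Q for t in row)
        Q = wedge_step(Q, Lam, gy)
        size_new = sum(abs(t) for row in Q for t in row)
        log_norm += math.log(size_new / size_old)
        closed += math.log(abs(det)) + math.log(gy[0] * gy[1])
        Q = [[t / size_new for t in row] for row in Q]   # renormalise against underflow
    n = len(gy_seq)
    return log_norm / n, closed / n


rng = random.Random(3)
eps = 0.05
Lam = [[1 - 0.4 * eps, 0.4 * eps], [0.7 * eps, 1 - 0.7 * eps]]   # lambda_12 = 0.4, lambda_21 = 0.7
gys = [[rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)] for _ in range(2000)]
iterated, closed = growth_rates_d2(Lam, gys)
```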

3. Proof of (1.6)

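When the observation process $Y^\varepsilon$ takes values in a discrete alphabet $\mathbb S'=\{b_1,\dots,b_{d'}\}$, the conditional densities (with respect to the point measure $\varphi(dy)=\sum_{i=1}^{d'}\delta_{b_i}(dy)$) are of the form

$$g_i(y)=\sum_{j=1}^{d'}p_{ij}1_{\{y=b_j\}},\qquad\sum_{j=1}^{d'}p_{ij}=1,\quad p_{ij}\ge0,$$

and hence by (2.7) (with $\pi^\varepsilon_{1|0}:=(\Lambda^\varepsilon)^*\pi^\varepsilon_0$ for brevity)

$$\begin{aligned}\lambda_1(\varepsilon)&=E\log\big|G(Y^\varepsilon_1)(\Lambda^\varepsilon)^*\pi^\varepsilon_0\big|=E\sum_{j=1}^{d'}1_{\{Y^\varepsilon_1=b_j\}}\log\Big(\sum_{i=1}^dp_{ij}\pi^\varepsilon_{1|0}(i)\Big)\\&=E\sum_{j=1}^{d'}P\big(Y^\varepsilon_1=b_j\,|\,\mathcal F^{Y^\varepsilon}_{(-\infty,0]}\big)\log P\big(Y^\varepsilon_1=b_j\,|\,\mathcal F^{Y^\varepsilon}_{(-\infty,0]}\big)=:-\mathcal H(Y^\varepsilon),\end{aligned}\tag{3.1}$$

where $\mathcal H(Y^\varepsilon)$ is known as the entropy rate of the stationary process $Y^\varepsilon=(Y^\varepsilon_n)_{n\in\mathbb Z}$.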

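Consider now the special case when $X^\varepsilon$ and $Y^\varepsilon$ take values in $\mathbb S=\{0,1\}$ and $p=P(Y^\varepsilon_1=i\,|\,X^\varepsilon_1=j)$ for $i\ne j$. The vector $\pi^\varepsilon$ is effectively one-dimensional, and hence $P\big(Y^\varepsilon_1=1\,|\,\mathcal F^{Y^\varepsilon}_{(-\infty,0]}\big)=(1-p)\pi^\varepsilon_{1|0}+p(1-\pi^\varepsilon_{1|0})$, where

$$\pi^\varepsilon_{1|0}:=P\big(X^\varepsilon_1=1\,|\,\mathcal F^{Y^\varepsilon}_{(-\infty,0]}\big)=(1-\varepsilon\lambda_{10})\pi^\varepsilon_0+\varepsilon\lambda_{01}(1-\pi^\varepsilon_0)\tag{3.2}$$

and $\pi^\varepsilon_0:=P\big(X^\varepsilon_0=1\,|\,\mathcal F^{Y^\varepsilon}_{(-\infty,0]}\big)$ are redefined for brevity.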

Let $h(x):=-x\log x-(1-x)\log(1-x)$, $x\in[0,1]$, and $\ell_p(q)=(1-p)q+p(1-q)$, and define

$$H(p,q):=h\big(\ell_p(q)\big),\qquad p,q\in[0,1],$$

where $0\log0=0$ is understood. Since $h(x)\le\log2$ with equality at $x=1/2$, and $\ell_p(1/2)=1/2$, we have $H(p,q)\le\log2$ for all $p,q\in[0,1]$ with equality at $q=1/2$. Since $h(x)$ is a concave function, symmetric around $x=1/2$,

$$H(p,q)=h\big((1-p)q+p(1-q)\big)\ge q\,h(1-p)+(1-q)\,h(p)=h(p),\qquad p\in[0,1],$$

with equality at $q=0$ and $q=1$. Finally, for any fixed $p\in[0,1]$, $q\mapsto H(p,q)$ inherits concavity and symmetry from $h(x)$. These properties imply the following lower bound (the chord of $q\mapsto H(p,q)$ between $q=0$ and $q=1/2$, reflected about $q=1/2$):

$$H(p,q)\ge h(p)+2\big(\log2-h(p)\big)\min(q,1-q),\qquad p,q\in[0,1].\tag{3.3}$$
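The lower bound (3.3) can be checked on a grid. A minimal sketch, with the convention $0\log0=0$ built into `h`; the grid resolution is an arbitrary choice and the function names are illustrative.

```python
import math


def h(x):
    """Binary entropy -x log x - (1 - x) log(1 - x), with 0 log 0 = 0."""
    return -sum(t * math.log(t) for t in (x, 1.0 - x) if t > 0)


def H(p, q):
    """H(p, q) = h(ell_p(q)) with ell_p(q) = (1 - p) q + p (1 - q)."""
    return h((1 - p) * q + p * (1 - q))


def lower_bound_33(p, q):
    """Right-hand side of (3.3): h(p) + 2 (log 2 - h(p)) min(q, 1 - q)."""
    return h(p) + 2 * (math.log(2) - h(p)) * min(q, 1 - q)


grid = [k / 100 for k in range(101)]
gaps = [H(p, q) - lower_bound_33(p, q) for p in grid for q in grid]
```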

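By Theorem 1 in […], for the symmetric chain $X^\varepsilon$ (with jump probability $\varepsilon\lambda$) and $p\ne1/2$,

$$E\min(\pi^\varepsilon_0,1-\pi^\varepsilon_0)=P\Big(X^\varepsilon_0\ne\operatorname*{argmax}_i\pi^\varepsilon_0(i)\Big)=\frac{\lambda\varepsilon\log\varepsilon^{-1}}{\mathcal D_p}\big(1+o(1)\big),\qquad\varepsilon\to0,\tag{3.4}$$

where $\mathcal D_p:=p\log\frac{p}{1-p}+(1-p)\log\frac{1-p}{p}$. The expression for $\mathcal H(Y^\varepsilon)$ in the case $d=2$ reads

$$\mathcal H(Y^\varepsilon)=E\,H(p,\pi^\varepsilon_{1|0})=E\,H(p,\pi^\varepsilon_0)+O(\varepsilon),\qquad\varepsilon\to0,$$

where the latter asymptotic follows from (3.2), since $H(p,q)$ is differentiable in $q$ with a bounded derivative for $p\in(0,1/2)$ (because $\ell_p(q)\in[p,1-p]$). Now (3.3) and (3.4) imply

$$\mathcal H(Y^\varepsilon)\ge h(p)+2\big(\log2-h(p)\big)\frac{\lambda\varepsilon\log\varepsilon^{-1}}{\mathcal D_p}\big(1+o(1)\big),\qquad\varepsilon\to0,$$

and (1.6) follows from (2.2), (2.9) and (3.1), since here $g_0(y)g_1(y)=p(1-p)$ for both values of $y$ and $\log\big(p(1-p)\big)+2h(p)=-\mathcal D_p$. □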
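The ingredients of the proof can be combined into a simple Monte Carlo estimate of $\gamma(\varepsilon)$ for the BSC example: by (2.2), (2.9) and (3.1), $\gamma(\varepsilon)=\log(1-2\varepsilon\lambda)+\log\big(p(1-p)\big)+2\mathcal H(Y^\varepsilon)$, and $\mathcal H(Y^\varepsilon)$ is estimated by the average predictive log-likelihood along a simulated path. This is a sketch under stated assumptions (symmetric chain, filter started at $\mu=(1/2,1/2)$, finite horizon $n$, hypothetical function name); its output is a random estimate, not an exact value.

```python
import math
import random


def gamma_bsc(p, lam, eps, n=100_000, seed=0):
    """Monte Carlo estimate of gamma(eps) for the BSC example, combining
    gamma = lim (1/n) log|rho ^ rho_bar| - 2 lambda_1              (2.2)
    lim (1/n) log|rho ^ rho_bar| = log(1 - 2 eps lam) + log(p(1 - p))   (2.9)
    lambda_1 = -H(Y^eps) = E log P(Y_1 | past)                     (3.1)"""
    rng = random.Random(seed)
    a = eps * lam                       # jump probability of the slow chain X^eps
    x = rng.randrange(2)                # stationary start, mu = (1/2, 1/2)
    pi1 = 0.5                           # P(X_m = 1 | Y_1, ..., Y_m), started at mu
    log_lik = 0.0
    for _ in range(n):
        if rng.random() < a:            # signal step
            x = 1 - x
        y = x ^ (rng.random() < p)      # Y = (X - xi)^2 = X xor xi
        pred = (1 - a) * pi1 + a * (1 - pi1)      # (3.2) with lambda_10 = lambda_01
        p_y1 = (1 - p) * pred + p * (1 - pred)    # P(Y_m = 1 | Y_1, ..., Y_{m-1})
        p_y = p_y1 if y == 1 else 1 - p_y1
        log_lik += math.log(p_y)
        pi1 = ((1 - p) if y == 1 else p) * pred / p_y   # Bayes correction
    lambda1 = log_lik / n               # estimates -H(Y^eps)
    return math.log(1 - 2 * a) + math.log(p * (1 - p)) - 2 * lambda1


g = gamma_bsc(0.1, 0.5, 0.05, n=50_000)
```

Evaluating `gamma_bsc` on a range of $\varepsilon$ is one way to trace the curve sketched in Figure 1; the estimate should stay close to $-\mathcal D_p$ for small $\varepsilon$.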